ABSTRACT

Transcription activator-like effector nucleases (TALENs) are engineered proteins that can stimulate precise genome editing through specific DNA double-strand breaks. Sickle cell disease and β-thalassemia are common genetic disorders caused by mutations in β-globin, and we engineered a pair of highly active TALENs that induce modification of 54% of human β-globin alleles near the site of the sickle mutation. These TALENs stimulate targeted integration of therapeutic, full-length β-globin cDNA to the endogenous β-globin locus in 19% of cells prior to selection, as quantified by single molecule real-time sequencing. We also developed highly active TALENs to human γ-globin, a pharmacologic target in sickle cell disease therapy. Using the β-globin and γ-globin TALENs, we generated cell lines that express GFP under the control of the endogenous β-globin promoter and tdTomato under the control of the endogenous γ-globin promoter. With these fluorescent reporter cell lines, we screened a library of small molecule compounds for their differential effect on the transcriptional activity of the endogenous β- and γ-globin genes and identified several that preferentially upregulate γ-globin expression.

INTRODUCTION 

Sickle cell disease is the most common monogenic disease worldwide and is caused by a single point mutation in the β-globin gene. Painful clinical symptoms begin shortly after birth as mutated β-globin subunits replace non-defective γ-globin chains in the predominant form of hemoglobin. Current pharmacological treatment with hydroxyurea partially reverses this globin switching by increasing the production of γ-globin (1,2). This has led to broad interest in developing other compounds and discovering new mechanisms that preferentially upregulate γ-globin (2-5), and in developing methods to study globin regulation (6,7). Analyses of differential expression of β- and γ-globin have generally been limited to hemoglobin electrophoresis or qRT-PCR, but recent reports have described a method that uses the expression of fluorescent molecules driven by the β- and γ-globin promoters as a readout of differential globin regulation. In those studies, the authors integrated into the genome a bacterial artificial chromosome (BAC) containing the entire 200 kb β-globin locus (which includes both β-globin and γ-globin among other genes), modified such that the β- and γ-globin promoters drive expression of fluorescent proteins (6,7). Integrating the complete genomic locus presumably preserves much of the physiologically relevant regulation of expression, but it does not allow direct analysis of the endogenous locus, and it is confounded by the facts that integration occurs at a random genomic location and that some cells gain multiple copies of the BAC. In addition, a BAC-based strategy creates a system in which the globin locus is triploid rather than diploid, and this change may also affect the regulatory dynamics. Alternatively, direct modification of the endogenous β- and γ-globin loci eliminates these confounding variables.

Endogenous genomic loci can be precisely altered using engineered zinc finger nucleases (ZFNs) (8-11) and TAL effector nucleases (TALENs) (12-14). ZFNs and TALENs consist of an engineered DNA-binding domain fused to the FokI endonuclease domain. Binding of a pair of ZFNs or TALENs to adjacent sites separated by a short spacer leads to dimerization of the FokI domain, resulting in a targeted DNA double-strand break. The break can be repaired by mutagenic non-homologous end joining or by high-fidelity homologous recombination with a homologous DNA donor template. Compared to ZFNs, TALENs seem to cause lower levels of cytotoxicity (15). Their recognition domain consists of repeated arrays of 34 amino acids that are conserved except at positions 12 and 13. These two amino acids comprise the repeat variable diresidue (RVD), which contacts the DNA and provides the nucleotide recognition specificity of each repeat (16,17). Unlike the other DNA bases, which each show a strong preference for a single RVD, guanine can be recognized by at least two RVDs with different binding characteristics. The asparagine-asparagine (NN) RVD can form a high-affinity hydrogen bond with guanine but is not specific, because it can also hydrogen bond with adenine (18,19). Conversely, the asparagine-lysine (NK) RVD seems to be more specific for guanine (13) but is less commonly found in naturally occurring TAL effector proteins (17).
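The base-to-RVD code described above can be sketched as a simple lookup. This minimal Python sketch assumes the widely used canonical assignments NI for A, HD for C and NG for T (the text itself only discusses the two guanine RVDs, NN and NK), lets the caller choose NN or NK for guanine, and reports whether the base immediately 5' of the binding site is a T, per the 5'T design rule (23). The function name `design_rvds` and the example half-site are hypothetical and are not the β-globin TALEN targets.

```python
# Canonical single-base RVD assignments (assumption: NI/HD/NG are the commonly
# used choices; the text only discusses the guanine-recognizing RVDs NN and NK).
BASE_TO_RVD = {"A": "NI", "C": "HD", "T": "NG"}


def design_rvds(target, guanine_rvd="NN", preceding_base=None):
    """Return (RVD list, follows_5T_rule) for a TALEN half-site read 5'->3'.

    guanine_rvd: "NN" (high affinity, also binds A) or "NK" (more G-specific).
    preceding_base: base immediately 5' of the half-site, used for the 5'T check.
    """
    if guanine_rvd not in ("NN", "NK"):
        raise ValueError("guanine_rvd must be 'NN' or 'NK'")
    rvds = []
    for base in target.upper():
        if base == "G":
            rvds.append(guanine_rvd)
        elif base in BASE_TO_RVD:
            rvds.append(BASE_TO_RVD[base])
        else:
            raise ValueError(f"unsupported base: {base!r}")
    follows_5t_rule = preceding_base is not None and preceding_base.upper() == "T"
    return rvds, follows_5t_rule


# Hypothetical half-site preceded by a T.
rvds, ok = design_rvds("GACTG", guanine_rvd="NK", preceding_base="T")
print("-".join(rvds), ok)  # NK-NI-HD-NG-NK True
```

Swapping `guanine_rvd` between "NN" and "NK" mirrors the NN/NK comparison made for the β-globin TALENs; every other repeat is unchanged.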

Recent reports have described the development and use of β-globin ZFNs to correct the sickle mutation in human iPS cells. The low rates of confirmed targeting described in these studies (1/300 (20) and 28/286 (21) drug-resistant clones were targeted) could be increased by improving the efficiency and toxicity profile of the engineered nucleases. Here, we used highly active and minimally toxic β-globin TALENs to stimulate homologous recombination of therapeutic β-globin cDNA into the endogenous β-globin locus in 19% of cells prior to selection. To analyse both the cutting efficiency of the TALENs and the rate of targeted integration, we employed a rapid, accurate and economical deep sequencing method known as single molecule real-time (SMRT) sequencing (22). We then describe a new method to generate reporter cells that express fluorescent proteins from endogenous genomic promoters. By using TALENs to target a promoterless GFP in-frame to the endogenous β-globin ATG start site and a promoterless tdTomato in-frame to the endogenous γ-globin ATG start site, we generated a robust endogenous reporter in the context of a common genetic disease. Finally, because γ-globin upregulation is therapeutic in sickle cell disease, we used these fluorescent reporter cells to screen small molecule compounds for those that preferentially upregulate γ-globin expression compared to β-globin.

MATERIALS AND METHODS 

Cell lines and transfections 

K562 cells (ATCC) were maintained in RPMI 1640 (Hyclone) supplemented with 10% bovine growth serum, 100 units/ml penicillin, 100 µg/ml streptomycin and 2 mM L-glutamine. K562 cells were transfected by nucleofection (Lonza) using program T-016 and a nucleofection buffer containing 100 mM KH₂PO₄, 15 mM NaHCO₃, 12 mM MgCl₂·6H₂O, 8 mM ATP and 2 mM glucose, pH 7.4. HEK293T cells were maintained in DMEM (Cellgro) supplemented with 10% bovine growth serum, 100 units/ml penicillin, 100 µg/ml streptomycin and 2 mM L-glutamine. HEK293T cells were transfected either by calcium phosphate or with Lipofectamine 2000 (Invitrogen).

Nuclease and targeting vector construction 

β-globin NK TALENs were synthesized (GenScript) using the Δ152 N-terminal domain and the +63 C-terminal domain previously described (13), fused to the FokI nuclease domain and cloned into pcDNA3.1 (Invitrogen). β-globin NN TALENs and γ-globin NN TALENs were synthesized using a Golden Gate cloning strategy (23) and cloned with the same N- and C-termini and nuclease domain into pcDNA3.1. The β-globin ZFNs were synthesized using the B2H selection strategy previously described (24). The β-Ubc-GFP targeting vector was generated by PCR-amplifying arms of homology from genomic DNA isolated from K562 cells using the primers in Supplementary Figure S10 and cloning a Ubc-GFP expression cassette between the arms. The β-in-frame-cDNA and β-in-frame-GFP targeting vectors were generated by overlap PCR to insert β-globin cDNA (OriGene) or GFP directly in-frame with the β-globin ATG start codon using the primers in Supplementary Figure S10. Silent mutations were introduced into the β-globin cDNA sequence at every sixth base pair between the nuclease cut site and the end of exon 1. The MGMT P140K drug selection cassette (generous gift from Dr Stan Gerson) was cloned into the targeting vector inside the arms of homology. The γ-in-frame-tdTomato targeting vector was generated by genomic PCR of the 5' and 3' arms of homology using primers in Supplementary Figure S10. tdTomato was fused in-frame to the γ-globin ATG start codon by overlap PCR. A neomycin phosphotransferase cassette was cloned between the arms of homology.
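The in-frame fusion used for the β-in-frame-cDNA, β-in-frame-GFP and γ-in-frame-tdTomato vectors can be illustrated at the sequence level: the payload open reading frame is placed immediately after the endogenous ATG carried at the 3' end of the 5' homology arm, and a frame check confirms the fusion. The function `assemble_in_frame_donor`, the toy sequences and the choice to drop the payload's own ATG are assumptions for illustration only, not the authors' cloning design. Selection cassettes and silent mutations are omitted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_in_frame_donor(left_arm, payload, right_arm):
    """Join left_arm (ending in the endogenous ATG) + payload ORF + right_arm.

    Assumption: the payload is an ORF that may start with its own ATG (removed
    so that the endogenous start codon is used) and must end with a stop codon.
    """
    left_arm, payload, right_arm = (s.upper() for s in (left_arm, payload, right_arm))
    if not left_arm.endswith("ATG"):
        raise ValueError("5' homology arm must end with the endogenous ATG")
    if payload.startswith("ATG"):
        payload = payload[3:]  # reuse the endogenous start codon
    orf = "ATG" + payload
    if len(orf) % 3 != 0:
        raise ValueError("payload would shift the reading frame")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if codons[-1] not in STOP_CODONS:
        raise ValueError("ORF must end with a stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]):
        raise ValueError("premature stop codon in fusion")
    return left_arm + payload + right_arm


# Toy sequences (hypothetical; the real arms are ~1 kb).
donor = assemble_in_frame_donor("CCAAGATG", "ATGGTGAAGTAA", "GGCTTT")
print(donor)  # CCAAGATGGTGAAGTAAGGCTTT
```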

In vitro transcription of nucleases 

TALEN and ZFN mRNA was synthesized in vitro with the MEGAscript T7 kit (Ambion), polyadenylated in vitro with the poly(A) tailing kit (Ambion) and purified with the MEGAclear kit (Ambion) following the manufacturer's protocols. Two versions of mRNA were synthesized, using either unmodified nucleotides or pseudouridine-5'-triphosphate (TriLink) in place of UTP (25).

SSA and toxicity assays 

A single-strand annealing (SSA) reporter was generated by disrupting the GFP gene: an internal 42 bp region was duplicated, and the two copies were separated by a 72 bp fragment from the β-globin region containing the nuclease recognition sites. The SSA reporter and each nuclease were transfected by calcium phosphate into HEK293T cells and analysed on an Accuri C6 flow cytometer (Accuri) after 2 days. The toxicity assay was performed as previously described (24). Briefly, HEK293T cells were co-transfected by calcium phosphate with a pair of nucleases and a GFP expression plasmid. The cells were analysed by FACS for percent GFP positive cells on day 2 and day 6. The day 2/day 6 ratio was normalized to a non-toxic nuclease sample.
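The logic of the SSA reporter can be sketched at the sequence level: a 42 bp internal segment of GFP is duplicated with the 72 bp nuclease target fragment between the two copies, and after a break in the insert, annealing of the repeats deletes one repeat plus the insert, restoring the intact gene. The random stand-ins for GFP and the β-globin fragment, the repeat position and the function names are hypothetical.

```python
import random


def build_ssa_reporter(gfp, repeat_start, target_fragment, repeat_len=42):
    """Duplicate gfp[repeat_start:repeat_start + repeat_len] around target_fragment."""
    repeat = gfp[repeat_start:repeat_start + repeat_len]
    upstream = gfp[:repeat_start + repeat_len]  # ends with the first repeat copy
    downstream = gfp[repeat_start:]             # begins with the second repeat copy
    return upstream + target_fragment + downstream, repeat


def simulate_ssa(reporter, repeat):
    """Anneal the two repeat copies: delete the first copy plus the intervening insert."""
    first = reporter.find(repeat)
    second = reporter.find(repeat, first + 1)
    if first < 0 or second < 0:
        raise ValueError("reporter does not contain two repeat copies")
    return reporter[:first] + reporter[second:]


rng = random.Random(0)
gfp = "".join(rng.choice("ACGT") for _ in range(300))     # stand-in for the GFP ORF
insert = "".join(rng.choice("ACGT") for _ in range(72))   # stand-in for the 72 bp target
reporter, repeat = build_ssa_reporter(gfp, repeat_start=100, target_fragment=insert)
assert len(reporter) == len(gfp) + 42 + 72
assert simulate_ssa(reporter, repeat) == gfp              # GFP restored after SSA
```

Because repair restores a functional GFP only when the insert is cut, the percentage of GFP positive cells serves as the readout of nuclease activity.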

Surveyor nuclease assay 

The Surveyor nuclease assay was performed as previously described (26). Briefly, 6 × 10⁵ HEK293T cells were lipofected with 1.5 µg of each nuclease, or 10⁶ K562 cells were nucleofected with 2.5 µg of each nuclease, unless otherwise indicated. After 3 days, genomic DNA was isolated using the DNeasy kit (Qiagen) and the locus of interest was PCR amplified with AccuPrime polymerase (Invitrogen) using the primers in Supplementary Figure S10. A total of 200 ng of the PCR product was treated with the Surveyor nuclease (Transgenomic) following the manufacturer's protocol. HEK293T cells were used to characterize the β-globin nucleases because of the presence of a naturally occurring SNP in K562 cells.
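The text does not state how percent modification was calculated from the Surveyor gels. A formula commonly used for mismatch-cleavage assays (an assumption here, not taken from this paper) assumes that after denaturation and random re-annealing only wild-type:wild-type duplexes escape cleavage, so the uncut fraction equals (1 - f)^2, where f is the fraction of modified alleles. Solving gives f = 1 - sqrt(1 - (b + c)/(a + b + c)), where a is the uncut band and b and c are the two cleavage products. A minimal sketch with hypothetical band intensities:

```python
import math


def surveyor_percent_modification(uncut, cut_a, cut_b):
    """Estimate % modified alleles from densitometry of Surveyor gel bands.

    uncut: intensity of the full-length (uncleaved) band
    cut_a, cut_b: intensities of the two specific cleavage products
    Assumes random re-annealing and cleavage of every mutant-containing duplex.
    """
    total = uncut + cut_a + cut_b
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cut_a + cut_b) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Hypothetical band intensities (arbitrary units).
print(round(surveyor_percent_modification(60, 25, 15), 1))  # 22.5
```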

SMRT sequencing and cDNA targeting 

PCR products (before cleavage by the Surveyor nuclease) were prepared for SMRT sequencing following the manufacturer's protocol (Pacific Biosciences). For SMRT sequencing of the β-globin cDNA targeting events, 10⁶ K562 cells were nucleofected with 10 µg β-in-frame-cDNA and 1 µg each of βL4 and βR4 TALENs. Aliquots were removed after 3 days, when the first round of selection was begun by adding 50 µM O6BG (Sigma) for 1 hour and then adding 40 µM BCNU (Sigma) for 1 hour before changing the media. Cells were allowed to recover for 7-10 days, at which time another aliquot was harvested and another round of selection started. Genomic DNA was isolated (Qiagen) and the β-globin region was PCR amplified using primers in Supplementary Figure S10, which did not amplify random integrants. Primers with unique 3 bp tags were used in the PCR reactions from each time point, so that the samples could be combined and analysed in one SMRT sequencing reaction. Data were analysed using CLC Genomics Workbench software.
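Pooling time points with 3 bp primer tags implies a demultiplexing step before analysis. The comparison of targeted and wild-type β-globin alleles described in the Results relies on amplicon length, because the wild-type allele retains intron 1 whereas the targeted cDNA lacks it. The sketch below shows both steps on plain read strings. The tag sequences, amplicon lengths (chosen to differ by 130 bp, the length of β-globin intron 1) and tolerance are hypothetical placeholders, reads are assumed to start with their tag in the forward orientation, and this is not the CLC Genomics Workbench workflow the authors used.

```python
from collections import Counter, defaultdict

# Hypothetical 3 bp sample tags at the 5' end of each read (one per time point).
TAGS = {"ACG": "no_selection", "CTA": "pulse_1", "GAT": "pulse_2", "TGC": "pulse_3"}

# Hypothetical amplicon lengths: wild type retains intron 1, targeted cDNA lacks it.
WT_LENGTH, TARGETED_LENGTH, TOLERANCE = 900, 770, 20


def demultiplex(reads):
    """Group reads by their leading 3 bp tag; reads with unknown tags are dropped."""
    groups = defaultdict(list)
    for read in reads:
        sample = TAGS.get(read[:3])
        if sample is not None:
            groups[sample].append(read[3:])
    return groups


def classify_by_length(read):
    """Call a read 'targeted', 'wild_type' or 'other' from its length alone."""
    if abs(len(read) - TARGETED_LENGTH) <= TOLERANCE:
        return "targeted"
    if abs(len(read) - WT_LENGTH) <= TOLERANCE:
        return "wild_type"
    return "other"


def targeted_fraction(reads):
    """Fraction of length-classifiable reads that come from targeted alleles."""
    counts = Counter(classify_by_length(r) for r in reads)
    informative = counts["targeted"] + counts["wild_type"]
    return counts["targeted"] / informative if informative else float("nan")


reads = ["ACG" + "A" * 900, "ACG" + "C" * 770, "CTA" + "G" * 771, "NNN" + "T" * 900]
groups = demultiplex(reads)
print({sample: targeted_fraction(r) for sample, r in groups.items()})
# {'no_selection': 0.5, 'pulse_1': 1.0}
```

As in the text, a length-based call would then be confirmed from sequence content (for example, the silent mutations in the targeted cDNA).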

Generation of fluorescent reporter cell lines 

A total of 10⁶ K562 cells were nucleofected with 10 µg of the targeting vector and 1 µg of each TALEN. β-globin-GFP cells were enriched by four rounds of selection with O6BG and BCNU, and clones were established by limiting dilution. γ-globin-tdTomato cells were enriched by treatment with 500 µg/ml G418, and clones were established by limiting dilution. Targeting was confirmed by genomic PCR spanning the integration junctions using primers in Supplementary Figure S10.

Quantitative real-time PCR 

Clonal populations of β-globin-GFP cells and γ-globin-tdTomato cells that were targeted at one allele were treated for 4 days with 400 µM hydroxyurea, and total RNA was harvested by TRIzol/chloroform extraction and purified on RNeasy columns (Qiagen). One microgram of total RNA was used to synthesize cDNA with the iScript cDNA kit (Bio-Rad) following the manufacturer's protocol. Biological triplicates were each assayed in triplicate by qRT-PCR using SYBR Green (Applied Biosystems) on a CFX384 real-time thermocycler (Bio-Rad) with the primers in Supplementary Figure S10 under the following conditions: initial denaturation (3 min at 95°C), followed by 40 cycles of a 3-step PCR cycle (10 s at 95°C, 30 s at 55°C, 5 s at 65°C). PCR efficiency (between 91% and 119%) was calculated for each primer set using serial dilutions of template. mRNA expression was quantified using the 2^-ΔΔCt method relative to the housekeeping gene GAPDH.
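The 2^-ΔΔCt calculation referenced above can be written out as follows. This sketch assumes that mean Ct values are taken over replicates and that target and reference assays amplify with roughly 100% efficiency. Because the measured efficiencies here ranged from 91% to 119%, an efficiency-corrected method such as Pfaffl's would be an alternative. The function name and Ct values are hypothetical.

```python
from statistics import mean


def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^-ddCt method.

    Each argument is a list of replicate Ct values; the reference gene would be
    GAPDH here. Assumes ~100% PCR efficiency for both target and reference.
    """
    d_ct_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: target gene vs GAPDH, hydroxyurea-treated vs untreated.
fc = fold_change_ddct([22.0, 22.1, 21.9], [16.0, 16.1, 15.9],
                      [28.0, 28.0, 28.0], [16.0, 16.0, 16.0])
print(round(fc, 1))  # 64.0
```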



Screening globin-modulating compounds 

β-globin-GFP cells and γ-globin-tdTomato cells were treated for 4 days with the indicated concentrations of GTP (Sigma), GDP (Sigma), GMP (Sigma), guanosine (Sigma), guanine (Sigma), cGMP (Sigma), decitabine (Sigma), sodium butyrate (Sigma), hydroxyurea (Sigma), zileuton (Sigma), hemin (Sigma), cisplatin (Santa Cruz Biotechnology), pomalidomide (Sigma), mithramycin (Fisher), apicidin (Sigma), cytarabine (Sigma) or phenylacetate (Sigma). Fluorescence was measured using an Accuri C6 cytometer (Accuri) and was reported as the fold change in fluorescence intensity after 4 days.

Statistical analysis 

Data from at least three samples were used to determine significance by statistical analysis. Mean ± SD is reported. Statistical significance was determined by Student's t-test, and P-values < 0.05 were considered significant.
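As a concrete illustration of the test named above, a two-sample Student's t-test (pooled variance, two-sided) can be computed with the Python standard library alone. To avoid depending on a statistics package, this sketch compares |t| with tabulated two-sided critical values for α = 0.05 rather than computing an exact P-value. The fluorescence values and function names are hypothetical. If SciPy is available, `scipy.stats.ttest_ind` with its default `equal_var=True` performs the same pooled-variance test and returns a P-value.

```python
import math
from statistics import mean, variance

# Two-sided critical t values for alpha = 0.05, indexed by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def student_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def significant_at_05(a, b):
    """True if the two-sided Student's t-test rejects equal means at alpha = 0.05."""
    t, df = student_t(a, b)
    if df not in T_CRIT_05:
        raise ValueError("extend the critical-value table for this sample size")
    return abs(t) > T_CRIT_05[df]


# Hypothetical fold changes in tdTomato intensity, n = 3 per group.
treated = [5.1, 4.8, 5.4]
untreated = [1.0, 1.1, 0.9]
print(significant_at_05(treated, untreated))  # True
```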

RESULTS 

Design and characterization of β- and γ-globin TALENs

To develop a system that robustly and rapidly reports on the activity of both the β-globin and γ-globin loci, we designed a gene-targeting strategy using engineered nucleases. Recent reports have described low but significant levels of genome modification at the endogenous β-globin locus using ZFNs (20,21), and we first sought to improve the rate of gene targeting at the β-globin locus by designing custom TALENs to that site. We identified four putative left (βL1-βL4) and four right (βR1-βR4) TALEN binding sites near the sickle mutation in β-globin (Figure 1A and Supplementary Figure S1) and synthesized the eight individual TALENs using the NK RVD to bind each guanine. Notably, we slightly modified the final TALEN expression vector to include the N- and C-terminal TALEN truncations that have been shown to be sufficient for optimal TALEN activity (13). In an extrachromosomal SSA assay, we identified six TALEN pairs that stimulated SSA at least 10-fold above background (Supplementary Figure S2). We then reconstructed the most active TALEN pair (βL4-NK/βR4-NK) to contain the NN RVD (βL4-NN/βR4-NN) using the Golden Gate cloning strategy previously described (23). To investigate their activities at the endogenous chromosomal β-globin locus, we used the Surveyor nuclease assay in HEK293T cells. HEK293T cells were used instead of hematopoietic K562 cells because a SNP in one β-globin allele of K562 cells confounded analysis in the Surveyor nuclease assay (data not shown). The NK versions modified up to 18% of alleles (Supplementary Figure S5A), and the NN TALENs modified 48% of alleles (Figure 1B and Supplementary Figure S5A). For comparison, we also used a modification of the 'oligomerized pool engineering' (OPEN) method to generate ZFNs to the β-globin locus (24,27). These ZFNs were made independently from those reported by Sebastiano et al. (21) but are designed to the same target sequence and are very similar in the amino acid sequence of the alpha-helices that mediate DNA binding (Supplementary Figure S3). Although the ZFNs were much more cytotoxic than the TALENs (Supplementary Figure S4), they were also active, modifying up to 12% of β-globin alleles in the Surveyor nuclease assay (Supplementary Figure S5A). Interestingly, delivery of TALENs as mRNA did not increase their already high frequency of cutting, but delivery of the ZFNs as mRNA increased the signal from 12% to 35% (Supplementary Figure S5B). Importantly, the TALENs showed only 4% modification at the δ-globin locus (Supplementary Figure S5C), which has high sequence homology with β-globin (Supplementary Figure S1).

Figure 1. TALEN-mediated disruption at the human β-globin and γ-globin loci. (A) Schematic of the βL4/βR4 TALEN binding site in the human β-globin gene. The ATG start codon and the sickle mutation are highlighted. (B) β-globin gene disruption in HEK293T cells. Arrows indicate specific Surveyor nuclease cleavage products. (C) SMRT sequencing of β-globin alleles mutated by treatment with βL4/βR4 TALENs. The 11 most abundant mutated alleles are shown, and the frequency of each is indicated. TALEN binding sites are underlined (Δ represents deletions, + represents insertions). (D) Schematic of the γL3/γR2 TALEN binding site in the human γ-globin gene. The ATG start codon is highlighted. (E) γ-globin gene disruption in K562 cells (*, non-specific cleavage product). (F) SMRT sequencing of γ-globin alleles mutated by treatment with γL3/γR2 TALENs. The 11 most abundant mutated alleles are shown, and the frequency of each is indicated. TALEN binding sites are underlined.

To confirm the frequency of genome modification by βL4-NN/βR4-NN, we used SMRT sequencing, a rapid, high-throughput method, to sequence the β-globin locus following TALEN treatment (22). SMRT sequencing allows simultaneous analysis of up to 30 000 sequences, as well as multiplexing of several samples at once. Analysis of 14 215 β-globin sequences revealed TALEN modification of 54% of alleles (Figure 1C).
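Counting modified alleles from sequencing reads can be approximated without a full aligner by anchoring on unmodified flanks on either side of the TALEN spacer and comparing the intervening sequence with the wild-type sequence. This is a simplified sketch, not the CLC Genomics Workbench analysis used here. The flank and spacer sequences are hypothetical, reads are assumed to be in the reference orientation, and real single-molecule reads carry sequencing errors that would call for error-tolerant alignment rather than exact matching.

```python
from collections import Counter


def classify_read(read, left_flank, right_flank, wt_between):
    """Return 'unmodified', 'modified' or 'unassigned' for one read.

    The sequence between the two flanks (which bracket the nuclease spacer)
    is compared with the wild-type sequence; any difference counts as modified.
    """
    start = read.find(left_flank)
    if start < 0:
        return "unassigned"
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end < 0:
        return "unassigned"
    return "unmodified" if read[start:end] == wt_between else "modified"


def modification_rate(reads, left_flank, right_flank, wt_between):
    """Fraction of assigned reads that are modified, plus the raw category counts."""
    counts = Counter(classify_read(r, left_flank, right_flank, wt_between) for r in reads)
    assigned = counts["modified"] + counts["unmodified"]
    return (counts["modified"] / assigned if assigned else float("nan")), counts


# Hypothetical flanks and wild-type spacer region.
LEFT, RIGHT, WT = "GATTACA", "CCGGTT", "AGCTAGCTAG"
reads = [
    "TT" + LEFT + WT + RIGHT + "AA",          # wild type
    "TT" + LEFT + "AGCTAG" + RIGHT + "AA",    # 4 bp deletion
    "TT" + LEFT + WT + "GG" + RIGHT + "AA",   # 2 bp insertion
    "TTTTTTTTTT",                             # no anchors, unassigned
]
rate, counts = modification_rate(reads, LEFT, RIGHT, WT)
print(round(rate, 3), counts["modified"], counts["unmodified"], counts["unassigned"])
```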

Next, to modify the endogenous γ-globin locus, we designed and constructed three left (γL1-γL3) and two right (γR2-γR3) NN TALENs that bind sequences near the ATG start codon of γ-globin (Figure 1D and Supplementary Figure S6). Because of the sequence identity between Aγ-globin and Gγ-globin, these TALEN pairs do not distinguish the two loci. To measure the activity of the γ-globin TALENs, we again used the Surveyor nuclease assay, which showed modification of up to 44% of γ-globin alleles with the γL3/γR2 pair (Figure 1E). Two other TALEN pairs modified >30% of γ-globin alleles (Supplementary Figure S7). Analysis of 14 790 γ-globin SMRT sequences revealed a modification rate of 53% with γL3/γR2 (Figure 1F). As expected, because of the lack of sequence homology between the β-globin and γ-globin loci, the γ-globin TALENs had no activity at the β-globin locus (data not shown).

TALEN-mediated β-globin targeting by homologous recombination

We then sought to determine at what frequency these highly active TALENs stimulated gene targeting by homologous recombination (Figure 2A). First, to target the β-globin locus, we designed a targeting vector with ~1 kb arms of homology 5' and 3' of the TALEN cut site. Between the homology arms, we included a Ubc-GFP expression cassette that, upon successful homologous recombination, would be stably integrated into the β-globin locus (Figure 2B, 'β-Ubc-GFP' targeting vector). Gene targeting was performed by nucleofection of β-Ubc-GFP with βL4-NN and βR4-NN TALEN expression plasmids into erythroleukemic K562 cells and resulted in stable integration of Ubc-GFP in 19% of transfected cells (13% overall), compared to <1% in the absence of TALENs (Figure 2C). We then compared the activities of the NK and NN β-globin TALENs in the gene-targeting assay. Consistent with the Surveyor assay data, the NN versions stimulated a significantly higher rate of targeted integration than the NK TALENs. Interestingly, when paired with βR4-NN, both βL4-NK and βL4-NN stimulated high rates of targeting (~20%). However, when paired with βR4-NK, βL4-NK resulted in 1.8% stable GFP expression, whereas βL4-NN led to 4.5% stable GFP expression (Supplementary Figure S8). Despite high rates of modification in the Surveyor assay (Supplementary Figure S5A), the ZFNs did not stimulate targeting of the β-globin locus, and targeted integration of the Ubc-GFP cassette could not be distinguished from background random integration (Figure 2C and Supplementary Figure S8). In this direct comparison of ZFNs and TALENs designed to target nearly the same sequence (Supplementary Figure S1), the TALENs performed significantly better: they had greater cutting activity, stimulated significantly more targeting and were less toxic. These data also demonstrate better activity with TALENs that use NN rather than NK as the RVD to recognize guanine, but show that NK TALENs can have excellent activity in the right context.



Targeting β-globin cDNA to the endogenous β-globin locus

We next sought to target full-length β-globin cDNA to the endogenous β-globin ATG start site. In this way, endogenous β-globin regulatory elements would express β-globin from the cDNA instead of from the wild-type genomic sequence, a strategy that would be clinically relevant for both sickle cell disease and β-thalassemia. We modified the β-Ubc-GFP targeting vector, replacing the Ubc-GFP cassette with β-globin cDNA fused in-frame to the natural β-globin ATG start codon already present in the 5' arm of homology (Figure 3A, 'β-in-frame-cDNA' targeting vector). The β-in-frame-cDNA targeting vector also included a drug selection cassette encoding a mutant form of methylguanine methyltransferase (MGMT P140K), which allowed enrichment of targeted cells by treatment with the combination of O6-benzylguanine (O6BG) and carmustine (BCNU).

To determine the frequency of targeting and the efficiency of drug selection, we again employed SMRT sequencing. First, we targeted K562 cells with the β-in-frame-cDNA targeting vector using βL4/βR4 TALENs. We then pulsed the samples three times with O6BG and BCNU and harvested gDNA after each pulse. To amplify the β-globin locus, we used a forward primer located 5' of, and outside, the 5' homology arm and a reverse primer in exon 2 of β-globin (Figure 3B). In this way, random integrants were not amplified. The presence of intron 1 in the wild-type genomic sequence of this locus, and its absence in the targeted β-globin cDNA, allowed us to determine the ratio of targeted to wild-type alleles after each pulse based on the length of the sequence, which could then be confirmed by the sequence content (Figure 3B). In the absence of drug selection, 8% of the alleles were targeted, as determined by analysing the sequence of 1100 alleles in the TALEN-treated sample. This targeting frequency of 8% of alleles is consistent with the observed rate of β-Ubc-GFP targeting in 19% of cells (Figure 2C) because there are three copies of the β-globin locus in K562 cells. Pulsing the targeted cells with O6BG/BCNU up to three times enriched targeted alleles such that they accounted for >60% of all sequenced alleles (Figure 3C). Since K562 cells are known to be aneuploid, with three copies of the globin locus (28), a post-selection modified allele frequency of 60% is consistent with a highly purified population in which nearly 100% of cells are targeted at one or more β-globin alleles.
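The comparison between cell-level and allele-level targeting frequencies rests on simple arithmetic for a cell line with three copies of the locus. This sketch converts a fraction of targeted cells into an expected fraction of targeted alleles, and converts an allele fraction into the mean number of targeted copies per targeted cell. The number of targeted alleles per cell is an assumption, not a measurement from the text, and the function names are illustrative.

```python
COPIES_PER_CELL = 3  # K562 cells carry three copies of the beta-globin locus (28)


def expected_allele_fraction(cell_fraction, targeted_copies_per_cell=1, copies=COPIES_PER_CELL):
    """Fraction of all alleles that are targeted, if each targeted cell carries
    `targeted_copies_per_cell` targeted alleles (an assumption)."""
    return cell_fraction * targeted_copies_per_cell / copies


def mean_targeted_copies(allele_fraction, cell_fraction=1.0, copies=COPIES_PER_CELL):
    """Mean number of targeted alleles per targeted cell implied by an allele fraction."""
    return allele_fraction * copies / cell_fraction


# Before selection: 19% of cells with one targeted allele each -> ~6.3% of alleles.
print(round(expected_allele_fraction(0.19), 3))        # 0.063
# The measured 8% of alleles in 19% of cells implies ~1.26 targeted copies per targeted cell.
print(round(mean_targeted_copies(0.08, 0.19), 2))      # 1.26
# After selection: 60% of alleles with ~100% of cells targeted -> ~1.8 of 3 copies per cell.
print(round(mean_targeted_copies(0.60), 2))            # 1.8
```

Under these assumptions, the 8% figure is compatible with most targeted cells carrying one targeted allele and some carrying two.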

Generation of fluorescent β- and γ-globin reporters by endogenous locus tagging

Next, we redesigned the β-Ubc-GFP targeting vector such that a promoterless GFP was fused in-frame to the β-globin ATG start codon (Figure 4A, 'β-in-frame-GFP' targeting vector). In this way, upon targeting to the endogenous β-globin locus, GFP would be driven by the endogenous β-globin promoter and would be subject to the regulatory elements controlling β-globin expression. We targeted the β-in-frame-GFP targeting vector to the β-globin locus using either βL4/βR4 TALENs or ZFNs. Under the same experimental conditions that resulted in targeting rates of 19% with the β-Ubc-GFP targeting vector, the β-in-frame-GFP targeting experiment resulted in a much lower percentage of GFP positive cells, which is attributable not to lower targeting frequencies but to the naturally low level of β-globin expression in K562 cells (28,29). That is, the level of GFP expression driven by the β-globin promoter is too low to be seen above background in many cells. Nonetheless, in the presence of βL4 and βR4 TALENs, there was a significantly higher percentage of GFP positive cells than in control samples (Figure 4B, white bars). Selection with two pulses of O6BG and BCNU resulted in significant enrichment of GFP positive cells in the TALEN and ZFN samples compared to the targeting vector alone (Figure 4B). Notably, even after up to four pulses of O6BG and BCNU, the overall percentage of GFP positive cells never rose above 20% (data not shown). We believe this is due to the low activity of the β-globin promoter in K562 cells. When we sorted GFP positive cells from the TALEN sample, the population went from >95% GFP positive to ~15% over the course of 2 weeks in culture (data not shown). A second sort again produced a population of >95% GFP positive cells that fell to 15%. We attributed this observation to the low level of β-globin expression in K562 cells, such that at any given time only ~15% of the population expressed GFP at a level high enough to be detected by flow cytometry. When we analysed 48 individual clones from the drug-selected TALEN sample, we observed three distinct patterns of GFP expression that we designated 'high', 'medium' and 'low' (Figure 4C). To determine whether these clonal populations expressed GFP because of targeting to the β-globin locus, we used a genomic PCR assay spanning the junction of integration (Figure 4A, arrows), in which the presence of a PCR product indicates correct targeting to the endogenous β-globin locus. Indeed, 11 of 12 analysed clones showed targeted integration (Figure 4D). Interestingly, the one clone that did not produce a PCR product, and thus was not targeted (clone #1), was a 'high' GFP expressing clone that had undergone random integration. Although we did not investigate the specific site of integration in this clone, its expression profile suggests that it integrated near strong promoter elements that drive robust expression of the transgene. Of the original 48 clones, only 4 had 'high' GFP expression, and these corresponded to the absence of targeting to the β-globin locus by junction PCR (data not shown). These data demonstrate that targeted cells show low levels of GFP expression because of the low activity of the β-globin promoter in K562 cells, and that high-expressing cells are, paradoxically, more likely to be the result of random integration.

To develop a fluorescence-based reporter of the endogenous γ-globin locus, we targeted tdTomato in-frame to the ATG start codon of γ-globin, using a homologous targeting vector containing in-frame tdTomato followed by a neomycin drug resistance cassette (Figure 5A, 'γ-in-frame-tdTomato'). Unlike β-globin, γ-globin is highly expressed in K562 cells, so the fluorescent readout from the targeted γ-in-frame-tdTomato accurately reflected the overall integration rate despite the lack of an exogenous promoter. Co-transfection of γ-in-frame-tdTomato with γL3/γR2 TALENs resulted in stable tdTomato expression in 34% of transfected cells (23% overall), compared to <1% in samples without TALENs (Figure 5B and C). Genomic PCR spanning the integration junction (Figure 5A, arrows) revealed a targeted band in samples treated with any of the three most active pairs of γ-globin TALENs (Figure 5E, left).
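The paired figures reported for stable expression ('of transfected cells' and 'overall') imply a transfection efficiency, because the overall rate equals the per-transfected-cell rate multiplied by the fraction of cells transfected. The back-calculation below is an inference from the reported, rounded percentages, not a value stated in the text, and the function name is illustrative.

```python
def implied_transfection_efficiency(rate_in_transfected, rate_overall):
    """Fraction of cells transfected, from overall = efficiency * per-transfected rate."""
    if not 0 < rate_in_transfected <= 1:
        raise ValueError("rate_in_transfected must be in (0, 1]")
    return rate_overall / rate_in_transfected


# beta-Ubc-GFP with betaL4/betaR4 TALENs: 19% of transfected cells, 13% overall.
# gamma-in-frame-tdTomato with gammaL3/gammaR2 TALENs: 34% of transfected cells, 23% overall.
for label, transfected, overall in [("beta-Ubc-GFP", 0.19, 0.13), ("gamma-tdTomato", 0.34, 0.23)]:
    print(label, round(implied_transfection_efficiency(transfected, overall), 2))
# beta-Ubc-GFP 0.68
# gamma-tdTomato 0.68
```

Both experiments imply roughly 68% of K562 cells transfected, which suggests the two normalizations were made in the same way.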

To create a dual-fluorescent reporter that expresses GFP from the endogenous β-globin locus and tdTomato from the endogenous γ-globin locus (Figure 5D), we used the γ-globin TALENs to target the γ-in-frame-tdTomato vector to the γ-globin locus in a previously targeted β-globin-GFP clone (Figure 5E, right). In this way, we generated three cell lines that report on the activity of endogenous globin promoters: the β-globin-GFP reporter, the γ-globin-tdTomato reporter and the β-globin-GFP/γ-globin-tdTomato dual reporter (Figure 5F). In the clones selected as reporter cell lines, expression from the fluorescent transgenes remained stable over more than 4 months in culture.

Using endogenous fluorescent reporter cells to screen globin-modulating compounds

Next, we sought to establish these fluorescent reporter lines as tools for comparing the globin-modulating activities of small molecule compounds. Hydroxyurea is used clinically to increase the production of γ-globin, and it has been shown to upregulate γ-globin in K562 cells (30,31). K562 cells treated for 4 days with 400 µM hydroxyurea showed a significant 62-fold increase in β-globin expression as measured by qRT-PCR. γ-globin mRNA levels rose even more than β-globin transcripts after treatment with hydroxyurea, increasing 932-fold (Figure 6A). Next, we treated the β-globin-GFP reporter cells and the γ-globin-tdTomato reporter cells with hydroxyurea and measured mean fluorescence intensity on day 4. GFP and tdTomato intensities were significantly higher than in untreated cells, and the increase in tdTomato was significantly greater than the increase in GFP, mirroring the changes in β- and γ-globin expression levels. These results show that the reporter cell lines can be used to rapidly, accurately and robustly measure the activity of the endogenous globin loci.

To expand our analysis, we treated cells from the β-globin-GFP, γ-globin-tdTomato and β-globin-GFP/γ-globin-tdTomato cell lines with 5 concentrations of 17 different compounds previously shown to modulate globin expression (Supplementary Figure S9). Of these, 10 significantly increased the expression of the endogenous γ-globin locus, the most striking of which were guanine, guanosine, apicidin and hydroxyurea (Figure 6B). Similarly, 10 compounds increased the expression of endogenous β-globin, with the best inducers being guanosine, guanine and GMP (Figure 6C). The ideal pharmacological therapy for sickle cell disease is a drug that preferentially induces the production of γ-globin compared to β-globin. Therefore, the most relevant analysis was of the ratio of induction of γ- to β-globin (Figure 6D). Compounds such as guanosine increased the expression of both γ- and β-globin (Figure 6E). However, apicidin was a strong inducer of γ-globin but had no activity at the β-globin promoter (Figure 6F). Importantly, hydroxyurea, the clinical standard of care for induction of γ-globin, had one of the highest γ/β induction ratios of all the screened compounds. In this way, we have established a system to robustly, rapidly and simultaneously report on the activity of the endogenous β- and γ-globin promoters.
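The γ/β induction ratio used to rank compounds can be sketched from mean fluorescence intensities (MFI): each reporter's fold change is the treated MFI divided by the vehicle (DMSO) MFI, and the ratio is the tdTomato fold change divided by the GFP fold change. The MFI values and compound labels below are hypothetical. The sketch only shows why a compound that raises both reporters ranks below one that raises tdTomato alone.

```python
def fold_change(treated_mfi, vehicle_mfi):
    """Fold change in mean fluorescence intensity relative to vehicle."""
    if vehicle_mfi <= 0:
        raise ValueError("vehicle MFI must be positive")
    return treated_mfi / vehicle_mfi


def gamma_beta_ratio(tdtomato_treated, tdtomato_vehicle, gfp_treated, gfp_vehicle):
    """Ratio of gamma-globin (tdTomato) to beta-globin (GFP) fold induction."""
    return fold_change(tdtomato_treated, tdtomato_vehicle) / fold_change(gfp_treated, gfp_vehicle)


# Hypothetical MFIs: (tdTomato treated, tdTomato vehicle, GFP treated, GFP vehicle).
screen = {
    "compound_both_up": (4000.0, 1000.0, 800.0, 200.0),     # both reporters up 4-fold
    "compound_gamma_only": (3000.0, 1000.0, 200.0, 200.0),  # only tdTomato up
}
ranked = sorted(screen, key=lambda name: gamma_beta_ratio(*screen[name]), reverse=True)
for name in ranked:
    print(name, gamma_beta_ratio(*screen[name]))
# compound_gamma_only 3.0
# compound_both_up 1.0
```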

DISCUSSION 

The emergence of the TALEN platform for engineering nucleases has made possible the rapid, open-source generation of highly active genome editing proteins. TALENs have been used to cause site-specific gene disruption and gene targeting in yeast (32,33), plants (23), nematodes (34), zebrafish (35,36), rats (37) and human cells (12-14). A recent report described TALENs designed to human β-globin and showed 5% gene correction of a mutated GFP gene, which had been disrupted by the insertion of the β-globin sequence recognized by the TALENs, but did not describe their activity at the endogenous β-globin locus (38). The authors then used the β-globin TALENs and a transposon-based targeting strategy to correct the sickle mutation in patient-derived iPS cells (39). In a third report, Cradick et al. designed a CRISPR/Cas9 system to target β-globin and showed efficient modification of the endogenous locus but demonstrated significant off-target effects (40). Here, we synthesized and compared the activities of NN-TALENs, NK-TALENs and ZFNs designed to the same genomic region in the human β-globin gene. We sought to induce a DNA double-strand break near the site of the sickle mutation, which limited the number of potential TALEN binding sites that adhered to the 5'T design rule (23). Although several of the TALENs without a 5'T did show nuclease activity at the β-globin sequence (Supplementary Figure S2), notably the most active pair, βL4/βR4, adhered to the 5'T rule. The highly active βL4 TALEN monomer was designed such that the most C-terminal RVD binds to the sickle thymine and not the wild-type adenine. In spite of this 1-bp mismatch, βL4/βR4 were as active as the 'wild-type sequence' βL4/βR4 TALENs in non-sickle cell lines (data not shown). The promiscuity of the TALEN pair designed to the sickle site for the wild-type sequence highlights the necessity of a thorough analysis of the off-target effects of this nuclease pair. However, activity at the wild-type sequence itself is not a concern for the potential therapeutic applications of this TALEN pair, as it would only be used clinically in patients with two mutated alleles.

Figure 6. Using fluorescent reporter cell lines to screen globin-modulating compounds. (A) Effect of 400 µM hydroxyurea on β- and γ-globin transcript levels (gray bars) and on GFP and tdTomato expression in targeted cell lines (white bars). Effect of drug treatment on (B) tdTomato expression in γ-globin-tdTomato cells, (C) GFP expression in β-globin-GFP cells and (D) the ratio of tdTomato/GFP expression in γ-globin-tdTomato and β-globin-GFP cells. FACS plots showing the effect of (E) guanosine and (F) apicidin on the expression of tdTomato and GFP from the γ-globin and β-globin loci. Drug concentrations: 0.2% DMSO, 10 µM pomalidomide, 400 µM GDP, 5 µM hemin, 400 µM cGMP, 200 µM GTP, 200 nM mithramycin, 200 µM zileuton, 4 mM phenylacetate, 400 µM GMP, 10 µM decitabine, 1200 µM sodium butyrate, 5 µM cytarabine, 4 µM cisplatin, 400 µM hydroxyurea, 400 nM apicidin, 400 µM guanosine and 400 µM guanine.

Using the βL4/βR4 TALENs, we targeted β-globin cDNA to the ATG start codon of the endogenous β-globin locus in human cells and used a deep sequencing method to precisely detect rates of targeting. We then developed a TALEN-based locus tagging method to report on the activity of endogenous promoters by targeting GFP and tdTomato to the start codons of the endogenous β-globin and γ-globin genes, respectively. Finally, we showed that our endogenously tagged reporter cells provide a rapid and facile method to analyse the globin-modulating activities of small molecule compounds.

Our strategy of using SMRT sequencing to validate the activity of engineered nucleases as determined by the Surveyor nuclease assay allows for the analysis of many more sequences than standard Sanger sequencing methods, at a fraction of the cost of other deep sequencing platforms such as Illumina. We believe that using deep sequencing to determine cutting and targeting frequencies will be especially beneficial in primary cells such as CD34+ hematopoietic stem cells, in which these rates are considerably lower than in cell lines.
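The two readouts compared in this paragraph can be sketched side by side. The Surveyor estimate uses the commonly applied conversion from the cleaved fraction of a heteroduplexed PCR product to a modification frequency, 100 × (1 − √(1 − f_cleaved)), while the sequencing estimate simply counts reads. This is an illustrative sketch, not the analysis pipeline used here: the read-classification rule (a donor-specific junction sequence marks targeted integration; otherwise the sequence between two flanks bracketing the cut site is compared with the reference) and all function names and sequences are hypothetical assumptions.

```python
import math
from collections import Counter


def surveyor_modified_percent(parent: float, cleaved_1: float, cleaved_2: float) -> float:
    """Estimate % modified alleles from Surveyor gel band intensities.

    Uses the common approximation 100 * (1 - sqrt(1 - fraction_cleaved)),
    which assumes wild-type and mutant strands re-anneal at random.
    """
    total = parent + cleaved_1 + cleaved_2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cleaved_1 + cleaved_2) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


def classify_read(read: str, left_flank: str, right_flank: str,
                  ref_between: str, junction: str) -> str:
    """Classify one read as 'targeted', 'modified', 'unmodified' or 'unassigned'.

    Hypothetical rule: a donor-specific junction sequence marks targeted
    integration; otherwise the sequence between two flanks that bracket the
    cut site is compared with the corresponding reference sequence.
    """
    if junction in read:
        return "targeted"
    start = read.find(left_flank)
    if start == -1:
        return "unassigned"
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return "unassigned"
    return "unmodified" if read[start:end] == ref_between else "modified"


def sequencing_rates(reads: list[str], left_flank: str, right_flank: str,
                     ref_between: str, junction: str) -> dict[str, float]:
    """Percent targeted and percent indel-modified among reads spanning the cut site."""
    counts = Counter(
        classify_read(r, left_flank, right_flank, ref_between, junction) for r in reads
    )
    assigned = counts["targeted"] + counts["modified"] + counts["unmodified"]
    if assigned == 0:
        raise ValueError("no reads spanned the cut site")
    return {
        "targeted_percent": 100.0 * counts["targeted"] / assigned,
        "indel_percent": 100.0 * counts["modified"] / assigned,
    }
```

Because each read is scored individually, the sequencing estimate does not depend on the re-annealing assumption behind the Surveyor formula, which is one reason read counting is useful for validating gel-based estimates.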

Using TALENs to target full-length β-globin cDNA to the endogenous β-globin locus provides an alternative to gene conversion of the sickle mutation using ZFNs, as recently described (20,21). First, we showed considerably higher nuclease activity, using a TALEN platform that has been shown in side-by-side comparisons to be less toxic than ZFNs (15). In terms of toxicity, we showed, using a previously described toxicity assay, that the βL4/βR4 TALENs have considerably less cellular toxicity than both the β-globin ZFNs and the widely used CCR5 ZFNs (Supplementary Figure S4). Analysis of the highly similar δ-globin locus revealed that the βL4/βR4 TALENs have minimal activity at that site (Supplementary Figure S5). True genome-wide, site-specific analysis of off-target activity is the focus of ongoing research.

With regard to nuclease activity, in the β-in-frame-GFP targeting experiments, which have a low background signal because of the lack of an exogenous promoter, we could detect targeted integration with the ZFNs after drug selection, showing that the ZFNs are capable of stimulating gene targeting at the β-globin locus. However, we were unable to detect targeting of the β-Ubc-GFP cassette with ZFNs at levels above background random integration, presumably due to extremely low targeting efficiency and the toxicity of the ZFNs. In summary, our βL4/βR4 TALENs are more active and less toxic than OPEN-generated ZFNs in both genomic and functional assays.

Another improvement in our strategy is that cDNA targeting would be therapeutic in both sickle cell disease, in which the causative mutation is at codon 6 of the β-globin gene, and β-thalassemia, in which causative mutations can occur throughout the length of the β-globin gene. The co-conversion of the sickle mutation with the downstream integration of a drug resistance cassette in the first intron, as described (20,21), has been shown to be less efficient when there is homologous sequence between the site of the conversion and the insertion site of the selectable marker (41), such as the first exon of β-globin. Therefore, when we designed the β-in-frame-cDNA targeting vector, we introduced silent mutations at every sixth nucleotide of the cDNA sequence between the nuclease cut site and the end of the first exon. By reducing the homology between the genomic locus and the cDNA, we shunted repair toward homologous recombination with the 3' homology arm (instead of with the short stretch of homology in exon 1 of the cDNA), ensuring that the drug selection cassette is also targeted to the locus. Unlike previous gene therapy trials that relied on random integration of β-globin and described the importance of β-globin introns for expression of the transgene (42), our next-generation approach directly modifies the endogenous locus, preserving the extragenic regulatory elements. Because intervening sequence 2 (IVS2) has been shown to increase expression of β-globin cDNAs up to 500-fold (43), if we find that expression of the β-globin cDNA is too low in primary cells, we can test whether adding the IVS2 sequence to the construct increases expression. In contrast to prior experiments testing the importance of IVS2, in our targeting experiments IVS2 is retained at the locus, and thus any regulatory effects it might have could still be preserved. The effect that including this intronic sequence in the targeting construct would have on the efficiency of homologous recombination would also have to be tested. Additionally, we chose to use the MGMT P140K-based drug selection strategy because it is effective in vitro (44) and relies on the FDA-approved compounds O6BG and BCNU, which can enrich for targeted cells in vivo (45).
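The recoding step described above, introducing silent mutations at every sixth nucleotide of an in-frame cDNA segment, can be sketched as follows. If the segment is read in frame from its first base, every sixth position is the third base of every second codon, where most synonymous changes are possible. This sketch assumes the standard genetic code and, as a hypothetical choice, takes the first synonymous alternative base in A, C, G, T order; codons with no synonymous third-position alternative (for example ATG and TGG) are left unchanged. The function names and the example sequence are illustrative and are not the authors' construct.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence whose length is divisible by 3."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def recode_every_sixth(seq: str) -> str:
    """Introduce silent changes at nucleotides 6, 12, 18, ... (1-based).

    Assumes seq is in frame from its first base and is a whole number of
    codons. Positions without a synonymous alternative are left unchanged.
    """
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence must be a whole number of codons")
    bases = list(seq)
    for pos in range(5, len(seq), 6):  # 0-based index of nucleotide 6, 12, ...
        codon = seq[pos - 2:pos + 1]   # pos is always the third base of its codon
        for alt in "ACGT":
            if alt != codon[2] and CODON_TABLE[codon[:2] + alt] == CODON_TABLE[codon]:
                bases[pos] = alt
                break
    return "".join(bases)


def percent_identity(a: str, b: str) -> float:
    """Position-by-position identity of two equal-length sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)
```

For the hypothetical segment "ATGAAAGGCTTTCTGGCA", recode_every_sixth returns "ATGAAGGGCTTCCTGGCC": both sequences translate to MKGFLA, but they now share only 15 of 18 positions, which is the kind of reduced homology intended to discourage recombination within exon 1.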

Dozens of reports have analysed the effect of drugs on globin expression, primarily by analysing transcript levels by qRT-PCR, hemoglobin electrophoresis or benzidine staining (4,6,7,46-55). We established a method to generate fluorescent cell lines as accurate reporters of differential globin expression, and validated them by comparing the induction of β- and γ-globin mRNA transcripts with the increase in GFP and tdTomato signal following treatment with hydroxyurea. Although we did not directly control for potential cell cycle effects on fluorescence following drug treatment, analysis of transcript levels by qRT-PCR confirms that these changes in fluorescence are due to modification of gene expression. We then used the fluorescent globin reporters in a mini drug screen to demonstrate their utility as tools to rapidly and accurately measure modulations in globin expression. Although the compounds we used had previously been described as γ-globin inducers, we found that more than half of them also significantly increased expression from the β-globin locus. One mechanism by which small molecule compounds affect globin expression is through the induction of erythroid differentiation (29). The degree to which these compounds affected the differentiation of this cell line, and the mechanism of globin induction by these compounds, were not directly investigated here. Regardless of the mechanism of induced globin expression, these data highlight the importance of simultaneously evaluating both β- and γ-globin expression in globin-induction studies. This is the first proof-of-principle example of using precise genome engineering to rapidly and efficiently generate cell lines with endogenous promoter reporters, validating the output by direct comparison to mRNA transcript levels, and then using the dual reporter cell line to screen for small molecules that differentially regulate two genes. In this way, we introduce a novel method to analyse endogenous promoter activity in the context of the most prevalent monogenic disease.

Historically, many globin expression studies were done in K562 cells because they are widely available erythroid precursors and are highly amenable to in vitro experimentation. However, despite their widespread use, K562 cells are an imperfect system with which to study the intricacies of globin biology because of their non-physiological levels of β- and γ-globin expression. Indeed, our results similarly show a very high level of baseline tdTomato expression in the γ-in-frame-tdTomato targeted cells and a low basal level of GFP expression in the β-in-frame-GFP cells. Despite this, we were able to demonstrate robust differential expression of β- and γ-globin upon induction by various pharmacological compounds, including a high γ/β induction ratio with hydroxyurea, the only compound clinically approved for this purpose. With these limitations in mind, we chose K562 cells as our model system because they can tolerate transfection of large amounts of DNA, allowing optimization of the vital genome engineering aspects of this strategy. Alternative cell lines, and ultimately primary erythroid progenitors, are clearly required to mechanistically describe and validate the methods of globin modulation suggested by this proof-of-principle work. As transfection methods for primary cells improve, and with the discovery of potentially less toxic modified RNAs, we anticipate achieving biologically relevant levels of genome modification in these cells.

Despite the limitations of K562 cells, there have been no fewer than 20 reports in the literature in the last year alone describing globin modulation in this cell line. Here, we describe a novel method to concurrently evaluate β- and γ-globin expression, using compounds that have been previously described to regulate globin expression. Having validated the effectiveness of this multi-fluorescent endogenous globin expression approach, we are now transitioning this work into a more biologically relevant cell line, which we can use in an unbiased high-throughput drug screen to identify novel γ-globin-specific inducers as the next generation of pharmacologic therapy for patients with sickle cell disease. More generally, this strategy could be broadly applied to generate multi-color reporter cell lines that allow rapid screening for conditions and compounds that promote the activity of a particular pathway or determine cellular fate.